{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# (beta) Dynamic Quantization on an LSTM Word Language Model\n\n**Author**: [James Reed](https://github.com/jamesr66a)\n\n**Edited by**: [Seth Weidman](https://github.com/SethHWeidman/)\n\n## Introduction\n\nQuantization involves converting the weights and activations of your model from float\nto int, which can result in smaller model size and faster inference with only a small\nhit to accuracy.\n\nIn this tutorial, we will apply the easiest form of quantization -\n[dynamic quantization](https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic) -\nto an LSTM-based next word-prediction model, closely following the\n[word language model](https://github.com/pytorch/examples/tree/master/word_language_model)\nfrom the PyTorch examples.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# imports\nimport os\nfrom io import open\nimport time\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F"
      ]
    },
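    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Before building the full language model, here is a minimal sketch of what dynamic\nquantization does, applied to a toy ``nn.Linear`` stack (the layer sizes here are\narbitrary, chosen only for illustration):\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# A minimal sketch: dynamically quantize a toy model.\n# Layer sizes are arbitrary, chosen only for illustration.\nimport torch.quantization\n\ntoy_model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))\n\n# Weights become int8; activations are quantized on the fly at inference.\ntoy_quantized = torch.quantization.quantize_dynamic(\n    toy_model, {nn.Linear}, dtype=torch.qint8\n)\nprint(toy_quantized)"
      ]
    },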
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1. Define the model\n\nHere we define the LSTM model architecture, following the\n[model](https://github.com/pytorch/examples/blob/master/word_language_model/model.py)\nfrom the word language model example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class LSTMModel(nn.Module):\n    \"\"\"Container module with an encoder, a recurrent module, and a decoder.\"\"\"\n\n    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):\n        super(LSTMModel, self).__init__()\n        self.drop = nn.Dropout(dropout)\n        self.encoder = nn.Embedding(ntoken, ninp)\n        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)\n        self.decoder = nn.Linear(nhid, ntoken)\n\n        self.init_weights()\n\n        self.nhid = nhid\n        self.nlayers = nlayers\n\n    def init_weights(self):\n        initrange = 0.1\n        self.encoder.weight.data.uniform_(-initrange, initrange)\n        self.decoder.bias.data.zero_()\n        self.decoder.weight.data.uniform_(-initrange, initrange)\n\n    def forward(self, input, hidden):\n        emb = self.drop(self.encoder(input))\n        output, hidden = self.rnn(emb, hidden)\n        output = self.drop(output)\n        decoded = self.decoder(output)\n        return decoded, hidden\n\n    def init_hidden(self, bsz):\n        weight = next(self.parameters())\n        return (weight.new_zeros(self.nlayers, bsz, self.nhid),\n                weight.new_zeros(self.nlayers, bsz, self.nhid))"
      ]
    },
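    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick sanity check (a small illustration, not part of the original example), we\ncan instantiate the model with tiny, arbitrary hyperparameters and run a single\nforward pass on random token indices to confirm the shapes line up:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Sanity check: one forward pass with tiny, arbitrary hyperparameters.\ntiny = LSTMModel(ntoken=10, ninp=8, nhid=8, nlayers=2)\ntiny.eval()  # disable dropout\n\nseq_len, batch_size = 5, 3\ntokens = torch.randint(10, (seq_len, batch_size), dtype=torch.long)\nhidden = tiny.init_hidden(batch_size)\n\nwith torch.no_grad():\n    decoded, hidden = tiny(tokens, hidden)\n\n# One score per vocabulary entry, at every (position, batch) slot.\nprint(decoded.shape)  # torch.Size([5, 3, 10])"
      ]
    },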
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2. Load in the text data\n\nNext, we load the\n[Wikitext-2 dataset](https://www.google.com/search?q=wikitext+2+data) into a `Corpus`,\nagain following the\n[preprocessing](https://github.com/pytorch/examples/blob/master/word_language_model/data.py)\nfrom the word language model example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class Dictionary(object):\n    def __init__(self):\n        self.word2idx = {}\n        self.idx2word = []\n\n    def add_word(self, word):\n        if word not in self.word2idx:\n            self.idx2word.append(word)\n            self.word2idx[word] = len(self.idx2word) - 1\n        return self.word2idx[word]\n\n    def __len__(self):\n        return len(self.idx2word)\n\n\nclass Corpus(object):\n    def __init__(self, path):\n        self.dictionary = Dictionary()\n        self.train = self.tokenize(os.path.join(path, 'train.txt'))\n        self.valid = self.tokenize(os.path.join(path, 'valid.txt'))\n        self.test = self.tokenize(os.path.join(path, 'test.txt'))\n\n    def tokenize(self, path):\n        \"\"\"Tokenizes a text file.\"\"\"\n        assert os.path.exists(path)\n        # Add words to the dictionary\n        with open(path, 'r', encoding=\"utf8\") as f:\n            for line in f:\n                words = line.split() + ['<eos>']\n                for word in words:\n                    self.dictionary.add_word(word)\n\n        # Tokenize file content\n        with open(path, 'r', encoding=\"utf8\") as f:\n            idss = []\n            for line in f:\n                words = line.split() + ['<eos>']\n                ids = []\n                for word in words:\n                    ids.append(self.dictionary.word2idx[word])\n                idss.append(torch.tensor(ids).type(torch.int64))\n            ids = torch.cat(idss)\n\n        return ids\n\nmodel_data_filepath = 'data/'\n\ncorpus = Corpus(model_data_filepath + 'wikitext-2')"
      ]
    },
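    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To see what ``Dictionary`` is doing, here is a small illustration on a toy sentence\n(made up for this example, not drawn from the dataset):\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Illustration: the Dictionary assigns each new word the next free index.\ntoy_dict = Dictionary()\nfor word in 'the cat sat on the mat <eos>'.split():\n    toy_dict.add_word(word)\n\nprint(len(toy_dict))             # 6 unique tokens ('the' appears twice)\nprint(toy_dict.word2idx['the'])  # 0 -- the first word seen\nprint(toy_dict.idx2word[3])      # 'on'"
      ]
    },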
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3. Load the pre-trained model\n\nThis is a tutorial on dynamic quantization, a quantization technique\nthat is applied after a model has been trained. Therefore, we'll simply load some\npre-trained weights into this model architecture; these weights were obtained\nby training for five epochs using the default settings in the word language model\nexample.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ntokens = len(corpus.dictionary)\n\nmodel = LSTMModel(\n    ntoken = ntokens,\n    ninp = 512,\n    nhid = 256,\n    nlayers = 5,\n)\n\nmodel.load_state_dict(\n    torch.load(\n        model_data_filepath + 'word_language_model_quantize.pth',\n        map_location=torch.device('cpu')\n        )\n    )\n\nmodel.eval()\nprint(model)"
      ]
    },
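    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a small aside (not in the original example), we can count parameters per\ntop-level submodule to get a feel for where the model's size comes from:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Count parameters per top-level submodule (illustrative aside).\nfor name, module in model.named_children():\n    n_params = sum(p.numel() for p in module.parameters())\n    print('{:10s} {:>12,d} parameters'.format(name, n_params))"
      ]
    },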
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now let's generate some text to ensure that the pre-trained model is working\nproperly - similarly to before, we follow\n[here](https://github.com/pytorch/examples/blob/master/word_language_model/generate.py)\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "input_ = torch.randint(ntokens, (1, 1), dtype=torch.long)\nhidden = model.init_hidden(1)\ntemperature = 1.0\nnum_words = 1000\n\nwith open(model_data_filepath + 'out.txt', 'w') as outf:\n    with torch.no_grad():  # no tracking history\n        for i in range(num_words):\n            output, hidden = model(input_, hidden)\n            word_weights = output.squeeze().div(temperature).exp().cpu()\n            word_idx = torch.multinomial(word_weights, 1)[0]\n            input_.fill_(word_idx)\n\n            word = corpus.dictionary.idx2word[word_idx]\n\n            outf.write(str(word.encode('utf-8')) + ('\\n' if i % 20 == 19 else ' '))\n\n            if i % 100 == 0:\n                print('| Generated {}/{} words'.format(i, 1000))\n\nwith open(model_data_filepath + 'out.txt', 'r') as outf:\n    all_output = outf.read()\n    print(all_output)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "It's no GPT-2, but it looks like the model has started to learn the structure of\nlanguage!\n\nWe're almost ready to demonstrate dynamic quantization. We just need to define a few more\nhelper functions:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "bptt = 25\ncriterion = nn.CrossEntropyLoss()\neval_batch_size = 1\n\n# create test data set\ndef batchify(data, bsz):\n    # Work out how cleanly we can divide the dataset into bsz parts.\n    nbatch = data.size(0) // bsz\n    # Trim off any extra elements that wouldn't cleanly fit (remainders).\n    data = data.narrow(0, 0, nbatch * bsz)\n    # Evenly divide the data across the bsz batches.\n    return data.view(bsz, -1).t().contiguous()\n\ntest_data = batchify(corpus.test, eval_batch_size)\n\n# Evaluation functions\ndef get_batch(source, i):\n    seq_len = min(bptt, len(source) - 1 - i)\n    data = source[i:i+seq_len]\n    target = source[i+1:i+1+seq_len].reshape(-1)\n    return data, target\n\ndef repackage_hidden(h):\n  \"\"\"Wraps hidden states in new Tensors, to detach them from their history.\"\"\"\n\n  if isinstance(h, torch.Tensor):\n      return h.detach()\n  else:\n      return tuple(repackage_hidden(v) for v in h)\n\ndef evaluate(model_, data_source):\n    # Turn on evaluation mode which disables dropout.\n    model_.eval()\n    total_loss = 0.\n    hidden = model_.init_hidden(eval_batch_size)\n    with torch.no_grad():\n        for i in range(0, data_source.size(0) - 1, bptt):\n            data, targets = get_batch(data_source, i)\n            output, hidden = model_(data, hidden)\n            hidden = repackage_hidden(hidden)\n            output_flat = output.view(-1, ntokens)\n            total_loss += len(data) * criterion(output_flat, targets).item()\n    return total_loss / (len(data_source) - 1)"
      ]
    },
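    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make ``batchify`` and ``get_batch`` concrete, here is a worked toy example on a\ntensor of consecutive integers (arbitrary toy data, for illustration only):\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Worked example: how batchify and get_batch slice a toy sequence.\ntoy = torch.arange(12)\ntoy_batched = batchify(toy, 2)  # shape (6, 2): each column reads downward\nprint(toy_batched)\n# tensor([[ 0,  6],\n#         [ 1,  7],\n#         [ 2,  8],\n#         [ 3,  9],\n#         [ 4, 10],\n#         [ 5, 11]])\n\n# seq_len is clamped to 5 here, since the toy batch has only 6 rows.\ndata, target = get_batch(toy_batched, 0)\nprint(data.shape)    # torch.Size([5, 2]) -- inputs\nprint(target.shape)  # torch.Size([10]) -- flattened next-step targets"
      ]
    },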
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4. Test dynamic quantization\n\nFinally, we can call ``torch.quantization.quantize_dynamic`` on the model!\nSpecifically,\n\n- We specify that we want the ``nn.LSTM`` and ``nn.Linear`` modules in our\n  model to be quantized\n- We specify that we want weights to be converted to ``int8`` values\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import torch.quantization\n\nquantized_model = torch.quantization.quantize_dynamic(\n    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8\n)\nprint(quantized_model)"
      ]
    },
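    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick check (not part of the original example), we can walk the module tree and\nprint exactly which submodules were swapped out:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Compare module types before and after quantization.\nfor (name, m1), (_, m2) in zip(model.named_modules(),\n                               quantized_model.named_modules()):\n    if type(m1) is not type(m2):\n        print(name, ':', type(m1), '->', type(m2))"
      ]
    },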
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The model looks the same; how has this benefited us? First, we see a\nsignificant reduction in model size:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def print_size_of_model(model):\n    torch.save(model.state_dict(), \"temp.p\")\n    print('Size (MB):', os.path.getsize(\"temp.p\")/1e6)\n    os.remove('temp.p')\n\nprint_size_of_model(model)\nprint_size_of_model(quantized_model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Second, we see faster inference time, with no difference in evaluation loss:\n\nNote: we set the number of threads to one for single threaded comparison, since quantized\nmodels run single threaded.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "torch.set_num_threads(1)\n\ndef time_model_evaluation(model, test_data):\n    s = time.time()\n    loss = evaluate(model, test_data)\n    elapsed = time.time() - s\n    print('''loss: {0:.3f}\\nelapsed time (seconds): {1:.1f}'''.format(loss, elapsed))\n\ntime_model_evaluation(model, test_data)\ntime_model_evaluation(quantized_model, test_data)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Running this locally on a MacBook Pro, without quantization, inference takes about 200 seconds,\nand with quantization it takes just about 100 seconds.\n\n## Conclusion\n\nDynamic quantization can be an easy way to reduce model size while only\nhaving a limited effect on accuracy.\n\nThanks for reading! As always, we welcome any feedback, so please create an issue\n[here](https://github.com/pytorch/pytorch/issues) if you have any.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}